Skip to content

Conversation

@teh-cmc
Copy link
Member

@teh-cmc teh-cmc commented Dec 2, 2025

LLM summary:

This PR implements encoding and decoding of RRD manifests in footers for the Rerun RRD file format. The changes enable random access to chunks within RRD files by storing metadata about chunk locations and properties in footer manifests.

Key changes:

  • Adds manifest building during encoding, tracking chunk metadata (offsets, sizes, entity paths, etc.)
  • Implements manifest parsing during decoding with transport-to-application conversion
  • Adds CLI support for displaying parsed footers (--footers flag) and recomputing manifests during routing (--recompute-manifests flag)

To which I actually don't have all that much to add.

This PR is basically all the remaining glue so that, whenever one uses our Encoder or one of our Decoder variants, RRD footers and manifests will automagically be computed, injected and serialized/deserialized.

The most important part of this PR is arguably the addition of a footer_roundtrip test, that encodes a recording and then manually decodes all of its chunks directly using the generated RRD manifest, instead of using a Decoder.


Part of RRD footers series of PRs:

@teh-cmc teh-cmc added 📉 performance Optimization, memory use, etc do-not-merge Do not merge this PR include in changelog 🔩 data model Sorbet 🪵 Log & send APIs Affects the user-facing API for all languages dataplatform Rerun Data Platform integration labels Dec 2, 2025
@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Web viewer failed to build.

| Result | Commit | Link | Manifest |
| ------ | ------- | ----- |
| ❌ | | https://rerun.io/viewer/pr/12048 | +nightly +main |

View image diff on kitdiff.

Note: This comment is updated whenever you push a commit.

@github-actions
Copy link

github-actions bot commented Dec 2, 2025

Latest documentation preview deployed successfully.

Result Commit Link
5c6a4f8 https://landing-hurduz571-rerun.vercel.app/docs

Note: This comment is updated whenever you push a commit.

@teh-cmc teh-cmc changed the title RRD footers 2: encoding/decoding manifests RRD footers 3: encoding/decoding manifests Dec 2, 2025
@teh-cmc teh-cmc requested a review from Copilot December 2, 2025 13:41
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements encoding and decoding of RRD manifests in footers for the Rerun RRD file format. The changes enable random access to chunks within RRD files by storing metadata about chunk locations and properties in footer manifests.

Key changes:

  • Adds manifest building during encoding, tracking chunk metadata (offsets, sizes, entity paths, etc.)
  • Implements manifest parsing during decoding with transport-to-application conversion
  • Adds CLI support for displaying parsed footers (--footers flag) and recomputing manifests during routing (--recompute-manifests flag)

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
docs/content/reference/cli.md Documents new CLI flags for footer display and manifest recomputation
crates/top/rerun/src/commands/stdio.rs Updates stream reading to return RRD manifests alongside messages
crates/top/rerun/src/commands/rrd/route.rs Adds --recompute-manifests flag to enable footer generation during routing
crates/top/rerun/src/commands/rrd/print.rs Implements --footers flag to display parsed footer manifests
crates/top/rerun/src/commands/rrd/merge_compact.rs Updates to handle new return signature with footer metadata
crates/top/rerun/src/commands/rrd/filter.rs Updates to handle new return signature with footer metadata
crates/store/re_log_encoding/tests/snapshots/* Adds test snapshots for manifest data and schema validation
crates/store/re_log_encoding/tests/footer_roundtrip.rs Comprehensive roundtrip test for footer encoding/decoding
crates/store/re_log_encoding/src/rrd/log_msg.rs Implements Encodable/Decodable traits for RrdManifest
crates/store/re_log_encoding/src/rrd/encoder.rs Implements manifest building during message encoding
crates/store/re_log_encoding/src/rrd/decoder/stream.rs Adds rrd_manifests() method to async decoder
crates/store/re_log_encoding/src/rrd/decoder/state_machine.rs Accumulates manifests during decoding and implements manifest extraction
crates/store/re_log_encoding/src/rrd/decoder/iterator.rs Adds rrd_manifests() method to sync decoder iterator
crates/store/re_log_encoding/Cargo.toml Adds insta test dependency

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_2_rrd_manifests branch 2 times, most recently from 964c086 to c1e6776 Compare December 2, 2025 15:51
@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_3_encoding_decoding branch from f1ad3fb to 92b97b9 Compare December 2, 2025 17:14
}

#[test]
fn footer_roundtrip() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arguably the most important part of this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(really nice test!)

@teh-cmc teh-cmc marked this pull request as ready for review December 2, 2025 17:26
emilk added a commit that referenced this pull request Dec 2, 2025
I'll actually apply it once these PRs are merged:
* #12044
* #12047
* #12048
Copy link
Member

@zehiko zehiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

went through all 3 PRs, it was really nicely documented and easy to follow, from the state machine updated, through rrd manifest builder (which I luckily knew bit about). nice work!

}

#[test]
fn footer_roundtrip() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(really nice test!)

Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay!

// bit weirder on the other hand, but then again this is generally not a new a
// problem: we tend to perform Sorbet migrations a bit too aggressively all over
// the place. We really need a layer that sits between the transport and
// application layer where one can accessed the parsed, unmigrated data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100%

// Right now, the implemented behavior is that we end up with an empty footer, i.e. there are
// no manifests in it.
// Whether that's the correct behavior is another question, but at least it is defined for now
// and can be changed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love these kind of comments ⭐⭐⭐

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 3, 2025

@rerun-bot full-check

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

Started a full build: https://github.com/rerun-io/rerun/actions/runs/19887096962

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 3, 2025

Looking into why the full-check is not passing anymore :notsure:


Oke first problem, why does this code not log ::End messages? Do we have more dormant flush vs. finish bugs laying around? 😬

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rec = rerun::RecordingStreamBuilder::new("rerun_example_different_data_per_timeline")
        .save("diffdata.rrd")?;

    rec.set_time_sequence("blue timeline", 0);
    rec.set_duration_secs("red timeline", 0.0);
    rec.log("points", &rerun::Points2D::new([(0.0, 0.0), (1.0, 1.0)]))?;

    // Log a red color on one timeline.
    rec.reset_time(); // Clears all set timeline info.
    rec.set_duration_secs("red timeline", 1.0);
    rec.log(
        "points",
        &rerun::Points2D::update_fields().with_colors([0xFF0000FF]),
    )?;

    // And a blue color on the other.
    rec.reset_time(); // Clears all set timeline info.
    rec.set_time_sequence("blue timeline", 1);
    rec.log(
        "points",
        &rerun::Points2D::update_fields().with_colors([0x0000FFFF]),
    )?;

    // TODO(#5521): log VisualBounds2D

    Ok(())
}

TODO: what about C++?


Python is the one actually failing, presumably because it's the only one actually trying to log an End message ??

pixi run -e examples python docs/snippets/all/concepts/different_data_per_timeline.py
DEV ENVIRONMENT DETECTED! Re-importing rerun from: /home/cmc/dev/rerun-io/rerun/rerun_py/rerun_sdk
[2025-12-03T09:43:31Z ERROR re_log_encoding::rrd::file_sink] Failed to end log stream for diffdata_py.rrd: Invalid chunk: Detected malformed Chunk: global index 'Field { name: "red timeline", data_type: Duration(Nanosecond), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"rerun:index_name": "red timeline", "rerun:kind": "index"} }' does not have matching `:start` field

rerun rrd migrate yields the same thing, so I basically broke it all at some point, somehow:

pixi run rerun rrd migrate diffdata.rrd
[2025-12-03T09:46:26Z WARN  re_log_encoding::rrd::encoder] encoder couldn't be finished: Invalid chunk: Detected malformed Chunk: global index 'Field { name: "blue timeline", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"rerun:index_name": "blue timeline", "rerun:kind": "index"} }' does not have matching `:start` field
✅ 1 file(s) successfully migrated.

Interestingly, migrating real stuff like pixi run rerun rrd migrate data/footed/Wed_May_17_14:24:01_2023.rrd works no problem, so it's probably just a bug in the sanity check logic that gets hit for that particular snippet (whitespaces being problematic would be my guess).


Also: do we rerun rrd compare footers? We should really compare footers. Maybe not in this PR though.

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 3, 2025

@rerun-bot full-check

@github-actions
Copy link

github-actions bot commented Dec 3, 2025

@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 3, 2025

Alright, whitespaces in index names was the answer, ofc it was.

(Maybe, not confirmed) This unraveled some other unrelated flushing bugs too, it seems. I'll have to look at those in follow ups.

@teh-cmc teh-cmc removed the do-not-merge Do not merge this PR label Dec 3, 2025
teh-cmc added a commit that referenced this pull request Dec 3, 2025
LLM summary:
> This PR introduces the foundational framing infrastructure for RRD
stream footers, which will enable richer metadata and improved stream
navigation. It establishes the protocol-level types and state machine
changes needed to support footers without yet populating them with
content.
> 
> Key changes include:
> - Addition of `RrdFooter` message types at both transport (protobuf)
and application levels
> - Introduction of `StreamFooter` frame for locating and validating RRD
footers
> - Enhanced decoder state machine to handle the new footer frames
> - Updated encoder to emit footers with basic state tracking
infrastructure

To which I would add the following:

This introduces all the framing infrastructure so that RRD streams are
now capable of carrying footers in all circumstances (including pipes).

Specifically, this introduces a couple new types:
* `RrdFooter`, a(n empty) protobuf message that will from now on act as
the payload for messages of type `MessageKind::End`.
* `MessageKind::End` isn't new, it's something that was always there,
but until now was always empty.
* As always, this has both a cheap transport-level definition and a less
cheap app-level definition.
* `StreamFooter`, a simple binary frame that mirrors `StreamHeader`, and
whose main job is to keep track of where the `RrdFooter`.
* This is used for O(1) access to the `RrdFooter`, e.g. when registering
data on a Redap-compliant server.

For now, these footers are always an empty protobuf messages. We will be
filling them up in the following PRs.

---

Part of RRD footers series of PRs:
* #12044
* #12047
* #12048
* rerun-io/dataplatform#2060


## TODO
* [x] `@rerun-bot full-check`

---------

Co-authored-by: Copilot <[email protected]>
@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_2_rrd_manifests branch from eae5d66 to fd1e2ef Compare December 3, 2025 10:42
teh-cmc added a commit that referenced this pull request Dec 3, 2025
LLM summary:
> This PR introduces RRD manifest support as the second part of the RRD
footers series. The manifest provides an index of chunks in an RRD
recording, enabling efficient query planning without loading actual
data. Key changes include:
> - Extending `RrdFooter` and adding `RrdManifest` protobuf definitions
> - Implementing `RrdManifest` and `RrdManifestBuilder` to catalog
chunks with their metadata (ID, size, offset, timeline ranges)
> - Adding transport layer conversions between application-level and
protobuf types
> - Refactoring schema hash computation to a shared utility method

To which I'll add:

This PR introduces the core datastructure that this whole footer
business is all about: the `RrdManifest`.

The `RrdManifest` is a protobuf message (which, as always, comes with
both a transport-level and application-level implementations) which
carries various important metadata about the associated recording.
The most important piece of metadata is the actual manifest
record-batch, which lists all the chunks in the recording at at
fine-grained enough level of detail that relevancy queries become
possible without ever having to load the data in a local chunk-store
first.
Effectively, you can view record-batch is a dataframe representation of
the time panel.

There is a lot of code, but it's all fairly trivial and mechanical,
bordering on boilerplate in some cases.
In fact, what you should really focus on is the fact that this code:


https://github.com/rerun-io/rerun/blob/ad754fab94517e80866f7911164a7622b932de74/crates/store/re_log_encoding/tests/footers_and_manifests.rs#L116-L173

yields this manifest record-batch:

<img width="1137" height="1468" alt="image"
src="https://github.com/user-attachments/assets/c7548fb8-4ea0-47fd-98c6-3783c2095e85"
/>

with this schema:


https://github.com/rerun-io/rerun/blob/ad754fab94517e80866f7911164a7622b932de74/crates/store/re_log_encoding/tests/snapshots/footers_and_manifests__simple_manifest_batch_schema.snap#L5-L117

---

Part of RRD footers series of PRs:
* #12044
* #12047
* #12048
* rerun-io/dataplatform#2060

---------

Co-authored-by: Copilot <[email protected]>
Base automatically changed from cmc/rrd_footers_2_rrd_manifests to main December 3, 2025 10:43
@teh-cmc teh-cmc force-pushed the cmc/rrd_footers_3_encoding_decoding branch from 180e33e to 6612d8f Compare December 3, 2025 11:00
@teh-cmc
Copy link
Member Author

teh-cmc commented Dec 3, 2025

Okay the diff is back to looking sane, not sure why git-rebase went completely haywire on this one 😑

@teh-cmc teh-cmc merged commit 863940c into main Dec 3, 2025
32 of 34 checks passed
@teh-cmc teh-cmc deleted the cmc/rrd_footers_3_encoding_decoding branch December 3, 2025 11:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🔩 data model Sorbet dataplatform Rerun Data Platform integration include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages 📉 performance Optimization, memory use, etc

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants